A two-stage feature selection method for text categorization
Similar resources
Two-Stage Feature Selection for Text Classification
In this paper, we focus on feature coverage policies used for feature selection in the text classification domain. Two alternative policies are discussed and compared: corpus-based and class-based selection of features. We make a detailed analysis of pruning and keyword selection by varying the parameters of the policies and obtain the optimal usage patterns. In addition, by combining the optim...
Categorical Proportional Difference: A Feature Selection Method for Text Categorization
Supervised text categorization is a machine learning task where a predefined category label is automatically assigned to a previously unlabelled document based upon characteristics of the words contained in the document. Since the number of unique words in a learning task (i.e., the number of features) can be very large, the efficiency and accuracy of the learning task can be increased by using...
A Knowledge-Based Feature Selection Method for Text Categorization
A major difficulty of text categorization is the high dimensionality of the original feature space. Feature selection plays an important role in text categorization. Automatic feature selection methods such as document frequency thresholding (DF), information gain (IG), mutual information (MI), and so on are commonly applied in text categorization. Many existing experiments show IG is one of th...
A global-ranking local feature selection method for text categorization
In this paper, we propose a filtering method for feature selection called ALOFT (At Least One FeaTure). The proposed method focuses on specific characteristics of the text categorization domain. It also ensures that every document in the training set is represented by at least one feature, and the number of selected features is determined in a data-driven way. We compare the effectiveness of the pr...
Enhancement of DTP Feature Selection Method for Text Categorization
This paper studies the structure of vectors obtained by using term selection methods in high-dimensional text collection. We found that the distance to transition point (DTP) method omits commonly occurring terms, which are poor discriminators between documents, but which convey important information about a collection. Experimental results obtained on the Reuters-21578 collection with the k-NN...
Journal
Journal title: Computers & Mathematics with Applications
Year: 2011
ISSN: 0898-1221
DOI: 10.1016/j.camwa.2011.07.045